Abstract

We study and solve the problem of classical channel simulation with quantum side infor- 
mation at the receiver. This is a generalization of both the classical reverse Shannon theorem, 
and the classical-quantum Slepian-Wolf problem. The optimal noiseless communication rate 
is found to be reduced from the mutual information between the channel input and output by 
the Holevo information between the channel output and the quantum side information. 

Our main theorem has two important corollaries. The first is a quantum generalization of 
the Wyner-Ziv problem: rate-distortion theory with quantum side information. The second is 
an alternative proof of the trade-off between classical communication and common randomness 
distilled from a quantum state. 

The fully quantum generalization of the problem considered is quantum state redistribution. 
Here the sender and receiver share a mixed quantum state and the sender wants to transfer 
part of her state to the receiver using entanglement and quantum communication. We present 
outer and inner bounds on the achievable rate pairs. 

1 Introduction 

In his seminal 1948 paper [23], Shannon introduced the problem of data compression. He found
that a memoryless source consisting of a large number n of symbols generated according to a 
probability distribution p can be compressed without loss at a rate of H (p) bits per symbol, where 
H(p) is the Shannon entropy of p. This result can be rephrased as a communication problem. 
The sender Alice wants to communicate her source to the receiver Bob. Equivalently, she wants to 
simulate a noiseless bit channel (which we denote by id) from her to Bob with respect to the input 
p. She can accomplish this task by using up a rate H(p) of perfect bit channels (which we denote
by [c → c]) from her to Bob. The protocol consists of Alice sending the compressed source and
Bob performing decompression upon receipt. The existence of such a protocol may be succinctly
expressed as a resource inequality [10, 15, 11]

$$H(p)\,[c \to c] \geq \langle \mathrm{id} : p \rangle.$$

The non-local resource on the left hand side can be composed with local pre- and post-processing
to simulate the non-local resource on the right hand side.

Figure 1: The relation of our results to prior work.
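As a small numerical illustration of the compression rate H(p), the following sketch computes the Shannon entropy of a source and the approximate number of noiseless bits needed for n symbols. The function names and the example source are hypothetical, and the logarithm is assumed to be base 2 so that the rate is in bits.

```python
import math

def shannon_entropy(p):
    """Shannon entropy H(p) in bits of a probability vector p."""
    if abs(sum(p) - 1.0) > 1e-9:
        raise ValueError("probabilities must sum to 1")
    # terms with zero probability contribute nothing
    return -sum(px * math.log2(px) for px in p if px > 0)

def bits_needed(p, n):
    """Approximate number of uses of [c -> c] for n i.i.d. symbols (rate H(p))."""
    return math.ceil(n * shannon_entropy(p))

# hypothetical three-letter source
source = [0.5, 0.25, 0.25]
print(shannon_entropy(source), bits_needed(source, 1000))
```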

With this viewpoint in mind, Shannon's result was generalized some 50 years later to simulating 
noisy channels. The latter result was dubbed the reverse Shannon theorem |SJ|27j, referring to 
Shannon's noisy channel coding theorem |24| . One may well ask why one should be interested in 
simulating noise. The reason is a saving in resources: part of the classical communication [c → c]
can be replaced by shared coins or "common randomness" (denoted by [cc]). Common randomness
is a strictly weaker resource than classical communication because Alice can flip her coin locally 
and send the outcome to Bob. The reverse Shannon theorem is intimately related to lossy 
compression, or rate-distortion theory jS], where the communication rate is traded off against a 
suitably defined distortion level of the data. More generally, the reverse Shannon theorem is a 
useful tool for effecting trade-offs between resources 0] . 

Another generalization of Shannon's result, introduced by Slepian and Wolf |2S|, is to give Bob
side information about the source. The case of quantum side information was considered in [14].

In this paper we combine the two ideas of making the channel noisy and allowing quantum 
side information with the receiver. We also analyze several consequences for trade-offs. The first 
is rate-distortion theory with quantum side information paralleling the classical work of Wyner 
and Ziv. The second is an alternative derivation of a result from [15] concerning distillation
of common randomness from a bipartite quantum state with the assistance of one-way classical 
communication. The various implications of our result are shown in Figure 1. 

This paper is organized as follows. In Section 2 we introduce the notation and give some 
background. Section 3 contains our main result, Theorem 3.1, together with its proof. Section 4
discusses consequences of Theorem 3.1. In Section 5 we find outer and inner bounds for a fully
quantum version of our problem. Section 6 concludes with a discussion and proposed future work. 
2 Notation



Let us introduce some useful notation for bipartite classical-quantum systems. The state of a
classical-quantum system $XB$ can be described by an ensemble $\mathcal{E} = \{\rho_x^B, p(x)\}$, with $p(x)$ defined
on $\mathcal{X}$ and the $\rho_x^B$ being density operators on the Hilbert space $\mathcal{H}_B$ of $B$. Thus, with probability $p(x)$
the classical index and quantum state take on values $x$ and $\rho_x^B$, respectively. A useful representation
of classical-quantum systems is obtained by embedding the random variable $X$ in some quantum
system, also labelled by $X$. Then our ensemble $\{\rho_x^B, p(x)\}$ corresponds to the density operator

$$\rho^{XB} = \sum_x p(x)\, |x\rangle\langle x|^X \otimes \rho_x^B, \tag{1}$$

where $\{|x\rangle : x \in \mathcal{X}\}$ is an orthonormal basis for the Hilbert space $\mathcal{H}_X$ of $X$. A classical-quantum
system may, therefore, be viewed as a special case of a quantum one. The von Neumann entropy
of a quantum system $A$ with density operator $\sigma^A$ is defined as $H(A)_\sigma = -\mathrm{Tr}\,\sigma^A \log \sigma^A$. The
subscript is often omitted. For a tripartite quantum system $ABC$ in some state $\sigma^{ABC}$ define the
conditional von Neumann entropy

$$H(B|A) = H(AB) - H(A),$$

quantum mutual information

$$I(A;B) = H(A) + H(B) - H(AB) = H(B) - H(B|A),$$

and quantum conditional mutual information

$$I(A;B|C) = I(A;BC) - I(A;C).$$

For classical-quantum correlations (1) the von Neumann entropy $H(X)_\rho$ is just the Shannon
entropy $H(X) = -\sum_x p(x)\log p(x)$ of the random variable $X$. The conditional entropy $H(B|X)$
equals $\sum_x p(x) H(\rho_x^B)$. The mutual information $I(X;B)$ is the Holevo quantity of the
ensemble $\mathcal{E}$:

$$\chi(\mathcal{E}) = H\Big(\sum_x p(x)\rho_x^B\Big) - \sum_x p(x) H(\rho_x^B).$$
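To make the Holevo quantity concrete, here is a minimal sketch that evaluates it for an ensemble of qubit states. It assumes 2x2 density matrices given as nested lists and base-2 logarithms; the helper names are hypothetical, and the eigenvalues are obtained from the trace and determinant rather than from a linear algebra library.

```python
import math

def qubit_entropy(rho):
    """von Neumann entropy in bits of a 2x2 density matrix [[a, b], [conj(b), d]]."""
    a, b, d = rho[0][0].real, rho[0][1], rho[1][1].real
    tr = a + d
    det = a * d - abs(b) ** 2
    # eigenvalues of a 2x2 Hermitian matrix: tr/2 +- sqrt(tr^2/4 - det)
    disc = math.sqrt(max(tr * tr / 4 - det, 0.0))
    eigs = [tr / 2 + disc, tr / 2 - disc]
    return -sum(l * math.log2(l) for l in eigs if l > 1e-15)

def holevo(ensemble):
    """Holevo quantity of an ensemble given as a list of (p_x, rho_x) pairs."""
    avg = [[sum(p * rho[i][j] for p, rho in ensemble) for j in range(2)]
           for i in range(2)]
    return qubit_entropy(avg) - sum(p * qubit_entropy(rho) for p, rho in ensemble)

zero = [[1.0, 0.0], [0.0, 0.0]]   # |0><0|
one = [[0.0, 0.0], [0.0, 1.0]]    # |1><1|
plus = [[0.5, 0.5], [0.5, 0.5]]   # |+><+|
print(holevo([(0.5, zero), (0.5, one)]))   # orthogonal states
print(holevo([(0.5, zero), (0.5, plus)]))  # non-orthogonal states
```

Orthogonal states give one full bit, while non-orthogonal pure states give strictly less, which is the sense in which the side information B is only partially informative about X.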



Finally we need to introduce a classical-quantum analogue of a Markov chain. We may define a
classical-quantum Markov chain $Y \to X \to B$ associated with an ensemble $\{\rho_{xy}^B, p(x,y)\}$ for which
$\rho_{xy}^B = \rho_x^B$ is independent of $y$. Such an object typically comes about by augmenting the system $XB$
by the random variable $Y$ (classically) correlated with $X$ via a conditional distribution $W(y|x) =
\Pr\{Y = y \mid X = x\}$. This corresponds to the state

$$\rho^{XYB} = \sum_x p(x) \sum_y W(y|x)\, |y\rangle\langle y|^Y \otimes |x\rangle\langle x|^X \otimes \rho_x^B. \tag{2}$$

Here $W(y|x)$ is the noisy channel and $X$ and $Y$ are input and output random variables. Therefore
the classical-quantum system $YB$ can be expressed as

$$\rho^{YB} = \sum_y q(y)\, |y\rangle\langle y|^Y \otimes \rho_y^B, \tag{3}$$

with $q(y) = \sum_x p(x) W(y|x)$ and $\rho_y^B = \sum_x p(x|y)\rho_x^B$, where $p(x|y) = p(x)W(y|x)/q(y)$.
3 Channel simulation with quantum side information



Consider a classical-quantum system $XB$ in the state $\rho^{XB}$ of (1) such that the sender Alice possesses the
classical index $X$ and the receiver Bob has the quantum system $B$. Consider a classical channel
from Alice to Bob given by the conditional probability distribution $W$. Applying this channel to
the $X$ part of $\rho^{XB}$ results in the state $\rho^{XYB}$ given by (2). Ideally, we are interested in simulating
the channel $W$ using noiseless communication and common randomness, in the sense that the
simulation produces the state $\rho^{XYB}$. For reasons we will discuss later, we want Alice to also get
a copy of the output, so that the final state produced is

$$\rho^{XY\hat{Y}B} = \sum_x p(x) \sum_y W(y|x)\, |y\rangle\langle y|^Y \otimes |y\rangle\langle y|^{\hat{Y}} \otimes |x\rangle\langle x|^X \otimes \rho_x^B. \tag{4}$$

The systems $X$ and $Y$ are in Alice's possession, while Bob has $B$ and $\hat{Y}$.

As usual in information theory, this task is amenable to analysis when we go to the approximate,
asymptotic i.i.d. (independent, identically distributed) setting. This means that Alice and Bob
share n copies of the classical-quantum system $XB$, given by the state

$$\rho^{X^nB^n} = \sum_{x^n} p^n(x^n)\, |x^n\rangle\langle x^n|^{X^n} \otimes \rho_{x^n}^{B^n}, \tag{5}$$

where $x^n = x_1 \ldots x_n$ is a sequence in $\mathcal{X}^n$, $p^n(x^n) = p(x_1)\cdots p(x_n)$, and $\rho_{x^n} = \rho_{x_1} \otimes \rho_{x_2} \otimes \cdots \otimes \rho_{x_n}$.
They want to simulate the channel $W^n(y^n|x^n) = W(y_1|x_1)\cdots W(y_n|x_n)$ approximately, with error
approaching zero as $n \to \infty$. They have access to a rate of C bits/copy of common randomness,
which means that they have the same string $l$ picked uniformly at random from the set $\{0,1\}^{nC}$.
In addition, they are allowed a rate of R bits/copy of classical communication, so that Alice may
send an arbitrary string $m$ from the set $\{0,1\}^{nR}$ to Bob.
An $(n, R, C, \epsilon)$ simulation code consists of

• An encoding stochastic map $E_n : \mathcal{X}^n \times \{0,1\}^{nC} \to \{0,1\}^{nR} \times \{0,1\}^{nS}$. If the value of the
common randomness is $l \in \{0,1\}^{nC}$, Alice encodes her classical message $x^n$ as the index $ms$,
$m \in \{0,1\}^{nR}$, $s \in \{0,1\}^{nS}$, with probability $E_l(m,s|x^n) := E_n(m,s|x^n,l)$, and only sends $m$
to Bob;

• A set $\{\Lambda^{(lm)}\}_{lm \in \{0,1\}^{n(C+R)}}$, where each $\Lambda^{(lm)} = \{\Lambda^{(lm)}_{s'}\}_{s' \in \{0,1\}^{nS}}$ is a POVM acting on $B^n$
and taking on values $s'$. Bob does not get sent the true value of $s$ and needs to infer it from
the POVM;

• A deterministic decoding map $D_n : \{0,1\}^{nC} \times \{0,1\}^{nR} \times \{0,1\}^{nS} \to \mathcal{Y}^n$; this allows Alice
and Bob to produce their respective simulated outputs $y^n = D_l(m,s) := D_n(l,m,s)$ and
$\hat{y}^n = D_l(m,s')$, based on $l$, $m$ and $s$ (in Bob's case $s'$);

such that

$$\big\|(\rho^{XY\hat{Y}B})^{\otimes n} - \sigma^{X^nY^n\hat{Y}^nB^n}\big\|_1 \leq \epsilon. \tag{6}$$

Here the state $\sigma^{X^nY^n\hat{Y}^nB^n}$ denotes the result of the simulation, which includes Alice's original
$X^n$, the post-measurement system $B^n$, Alice's simulation output random variable $Y^n$ and Bob's
simulation output random variable $\hat{Y}^n$ (based on $s'$).
Figure 2: Achievable region of rate pairs for a classical-quantum system XB.



A rate pair $(R, C)$ is called achievable if for all $\epsilon, \delta > 0$ and sufficiently large $n$, there exists
an $(n, R + \delta, C + \delta, \epsilon)$ code.

We now state our main theorem. 

Theorem 3.1 The region of achievable $(R, C)$ pairs is given by

$$R \geq I(X;Y) - I(Y;B),$$

$$C + R \geq H(Y|B).$$

The theorem contains a direct coding part (achievability) and a converse part (optimality).
For the direct coding theorem it suffices to prove the achievability of the rate pair $(R, C) =
(I(X;Y) - I(Y;B), H(Y|X))$. The full region given by Theorem 3.1 (see Figure 2) follows by
observing that a bit of common randomness may be generated from a bit of communication.

A naive simulation would be for Alice to actually perform the channel $W$ locally and send a
compressed instance of the output to Bob. This would require a communication rate of $H(Y)$ bits
per copy. The first idea is to split this information into an intrinsic and extrinsic part [28]. The
extrinsic part has rate $H(Y|X)$ and is provided by the common randomness. Only the intrinsic
part $I(X;Y) = H(Y) - H(Y|X)$ requires classical communication. This protocol would amount
to sending the strings $m$ and $s$ above. However, a further saving of $I(Y;B)$ is accomplished by
Bob deducing the $s$ index from his quantum state. Thus Alice need only send $m$, which requires a
rate $I(X;Y) - I(Y;B)$.

For the direct coding part we will need several lemmas. The first one is the Chernoff bound
(cf. 0).

Lemma 3.2 (Chernoff bound) Let $Z_1, \ldots, Z_n$ be i.i.d. random variables with mean $\mu$. Define

The second lemma concerns deterministically "diluting" a uniformly distributed random variable
to a non-uniform one on a larger set. We will need it to create $y^n$ from $l$, $m$ and $s$.
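The corner point of the region in Theorem 3.1 can be checked numerically in the special case where Bob's side information is classical. The sketch below assumes, purely for illustration, that B is a random variable Z produced from X by a hypothetical channel V(z|x); it computes $R = I(X;Y) - I(Y;Z)$ and $C = H(Y|X)$ and confirms that they sum to $H(Y|Z)$, so the corner lies on the boundary $C + R = H(Y|B)$.

```python
import math

def H(dist):
    """Shannon entropy in bits of a dict mapping outcomes to probabilities."""
    return -sum(v * math.log2(v) for v in dist.values() if v > 0)

def marginal(joint, keep):
    """Marginal of a joint distribution over the coordinates listed in keep."""
    out = {}
    for key, v in joint.items():
        k = tuple(key[i] for i in keep)
        out[k] = out.get(k, 0.0) + v
    return out

def corner_point(p, W, V):
    """Return (R, C, H(Y|Z)) for source p[x], channel W[x][y], side information V[x][z].

    Assumption: the side information is classical, Z - X - Y is a Markov chain.
    """
    joint = {(x, y, z): p[x] * W[x][y] * V[x][z]
             for x in range(len(p))
             for y in range(len(W[0]))
             for z in range(len(V[0]))}
    Hx, Hy, Hz = (H(marginal(joint, [i])) for i in range(3))
    Hxy = H(marginal(joint, [0, 1]))
    Hyz = H(marginal(joint, [1, 2]))
    R = (Hx + Hy - Hxy) - (Hy + Hz - Hyz)   # I(X;Y) - I(Y;Z)
    C = Hxy - Hx                            # H(Y|X)
    return R, C, Hyz - Hz

# hypothetical example: binary symmetric channels with flip probabilities 0.1 and 0.2
p = [0.5, 0.5]
W = [[0.9, 0.1], [0.1, 0.9]]
V = [[0.8, 0.2], [0.2, 0.8]]
R, C, H_Y_given_Z = corner_point(p, W, V)
print(R, C, H_Y_given_Z)
```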
Lemma 3.3 (Randomness dilution) We are given a probability distribution $q(y)$ defined on $\mathcal{Y}$
and a set $T \subseteq \mathcal{Y}$ such that

$$q(T) := \sum_{y \in T} q(y) \geq 1 - \epsilon, \tag{8}$$

$$q(y) \geq \alpha, \quad \forall y \in T, \tag{9}$$

for some positive numbers $\alpha$ and $\epsilon$. Let $W$ be the random variable uniformly distributed on
$\{1, \ldots, M\}$. For random variables $Y_1, Y_2, \ldots, Y_M$ all distributed according to $q$, define the map
$G : \{1, \ldots, M\} \to \mathcal{Y}$ by $G(i) = Y_i$. Then, letting $\tilde{q}$ be the distribution of $G(W)$,

$$\Pr\{\|\tilde{q} - q\|_1 \geq \eta + \epsilon\} \leq 2|T| \exp(-\kappa_0 M \alpha \eta^2)$$

for some constant $\kappa_0$.

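Proof Consider the indicator function $I(G(i) = y)$ taking values in $\{0, 1\}$. Observe that $I(G(i) =
y)$ for $i \in \{1, \ldots, M\}$ are i.i.d. random variables with expectation value $\mathbb{E}\, I(G(i) = y) = q(y)$. The
distribution $\tilde{q}(y)$ of $G(W)$ is $\frac{1}{M}\sum_{i=1}^{M} I(G(i) = y)$. By the Chernoff bound for each $y \in T$,
for $\eta < 1$ and some constant $\kappa_0$,

$$\Pr\Big\{\Big|\frac{1}{M}\sum_{i=1}^{M} I(G(i) = y) - q(y)\Big| \geq q(y)\eta\Big\} \leq 2\exp(-\kappa_0 M \alpha \eta^2). \tag{10}$$

By the union bound,

$$\Pr\{\text{not } \iota\} \leq 2|T| \exp(-\kappa_0 M \alpha \eta^2),$$

where the logic statement $\iota$ is given by

$$\iota = \{\tilde{q}(y) \in [q(y)(1-\eta),\, q(y)(1+\eta)] \ \ \forall y \in T\}.$$

Let $\hat{q}(y) = q(y) I(y \in T)$. It remains to relate $\iota$ to a statement about $\|\tilde{q} - q\|_1$. First observe that, by (8),

$$\|\hat{q} - q\|_1 = \sum_y |q(y) - \hat{q}(y)| \leq \epsilon.$$

Second, observe that $\iota$ implies $\|\tilde{q} - \hat{q}\|_1 \leq \eta$. The two give, via the triangle inequality,

$$\|\tilde{q} - q\|_1 \leq \eta + \epsilon.$$

The statement of the lemma follows. □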
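The content of the randomness dilution lemma can be seen in a quick simulation: a random codebook of M samples from q, addressed by a uniform index, induces a distribution close to q in $\ell_1$ distance once M is large. The following sketch uses a hypothetical four-letter distribution and a fixed seed; it illustrates the statement and is not the construction used in the proof.

```python
import random

def dilution_distance(q, M, rng):
    """Draw a codebook G(1..M) i.i.d. from q and return ||q_tilde - q||_1,
    where q_tilde is the distribution of G(W) for uniform W on {1, ..., M}."""
    outcomes = list(range(len(q)))
    codebook = rng.choices(outcomes, weights=q, k=M)  # G(i) = Y_i
    counts = [0] * len(q)
    for y in codebook:
        counts[y] += 1
    q_tilde = [c / M for c in counts]
    return sum(abs(a - b) for a, b in zip(q_tilde, q))

rng = random.Random(0)          # fixed seed for reproducibility
q = [0.4, 0.3, 0.2, 0.1]        # hypothetical target distribution
for M in (10, 1000, 100000):
    print(M, dilution_distance(q, M, rng))
```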



Corollary 3.4 Consider a random variable $Y$ with distribution $q(y)$, and let $W$ be the random
variable uniformly distributed on $\{1, \ldots, M\}$. For random variables $Y_1, Y_2, \ldots, Y_M$ all distributed
according to $q^n$, define the map $G : \{1, \ldots, M\} \to \mathcal{Y}^n$ by $G(i) = Y_i$. Let $\tilde{q}$ be the distribution of
$G(W)$. Then, for all $\epsilon, \delta > 0$ and sufficiently large $n$,

$$\Pr\{\|\tilde{q} - q^n\|_1 \geq 2\epsilon\} \leq 2\gamma \exp(-\kappa_0 M \epsilon^2/\gamma),$$

where $\gamma = 2^{n[H(Y) + c\delta]}$ and $c$ is some positive constant.

Proof We will assume familiarity with the properties of typicality and conditional typicality,
collected in the Appendix. We can relate to Lemma 3.3 through the identifications: $\mathcal{Y} \to \mathcal{Y}^n$,
$q(y) \to q^n(y^n)$, and $T \to \mathcal{T}^n_{Y,\delta}$. The two conditions now read

$$q^n(\mathcal{T}^n_{Y,\delta}) \geq 1 - \epsilon, \tag{12}$$

$$q^n(y^n) \geq \gamma^{-1}, \quad \forall y^n \in \mathcal{T}^n_{Y,\delta}. \tag{13}$$

These follow from properties 1 and 2 of Theorem A.1 (relabeling $X$ to $Y$ and $p$ to $q$). □

Our next lemma contains the crucial ingredient of the direct coding theorem and is based on
[28]. It will tell us how to define the encoding and decoding operations for a particular value of
the common randomness.

Lemma 3.5 (Covering lemma) We are given a probability distribution $q(y)$ and a conditional
probability distribution $P(x|y)$, with $x \in \mathcal{X}$ and $y \in \mathcal{Y}$. Assume the existence of sets $T \subseteq \mathcal{X}$ and
$\{T_y\}_{y \in \mathcal{Y}} \subseteq \mathcal{X}$ with the following properties for all $y \in \mathcal{Y}$:

$$\sum_{y \in \mathcal{Y}} q(y) P(T_y|y) \geq 1 - \epsilon, \tag{14}$$

$$\sum_{y \in \mathcal{Y}} q(y) P(T|y) \geq 1 - \epsilon, \tag{15}$$

$$|T| \leq K, \tag{16}$$

$$P(x|y) \leq k^{-1}, \quad \forall x \in T_y. \tag{17}$$

Define $M = \lceil \eta^{-1} K/k \rceil$ for some $0 < \eta < 1$. Given random variables $Y_1, Y_2, \ldots, Y_M$ all distributed
according to $q$, define the map $D : \{1, 2, \ldots, M\} \to \mathcal{Y}$ by $D(i) = Y_i$. Then there exists a conditional
probability distribution $E(i|x)$ defined for $i \in \{1, 2, \ldots, M\}$ such that

$$\Pr\{\|Pu - Ep\|_1 \geq 5\epsilon\} \leq 2K \exp(-\kappa_0 \epsilon^3/\eta), \tag{18}$$

where $P(x|i) = P(x|D(i))$, $u$ is the uniform distribution on $\{1, 2, \ldots, M\}$ and $p$ is the marginal
distribution defined by $p(x) = \sum_{y \in \mathcal{Y}} P(x|y) q(y)$.

Remark The meaning of the covering lemma is illustrated in Figure 3. A uniform distribution
on the set $\{1, 2, \ldots, M\}$ is diluted via the map $D$ to the set $\mathcal{Y}$, and then stochastically mapped to
the set $\mathcal{X}$ via $P(x|y)$. Condition (18) says that the very same distribution on $\{1, 2, \ldots, M\} \times \mathcal{X}$
can be obtained by starting with the marginal $p(x)$ and stochastically "concentrating" it to the set
$\{1, 2, \ldots, M\}$. For this to be possible, the conditional outputs of the channel $P(x|y)$ (for particular
values of $y$) should be sufficiently spread out to cover the support of $p(x)$. Each conditional output
random variable is supported on $T_y$ (14) of cardinality roughly $\geq k$ (17), and $p(x)$ is supported
on $T$ (15) of cardinality $\leq K$ (16). Thus roughly $M \approx K/k$ conditional random variables $P(x|i)$
should suffice for the covering.

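Proof The idea is to use the Chernoff bound, as in the proof of the randomness dilution lemma.
First we trim our conditional distributions to make them fit the conditions of the Chernoff bound;
the resulting bound is then related to the condition (18).

Figure 3: The covering lemma.

Define

$$w(x) = \sum_y q(y) P(x|y) I(x \in A_y),$$

with $A_y = T_y \cap T$ and $A = \bigcup_{y \in \mathcal{Y}} A_y$. By properties (14) and (15), $w(A) = \sum_{y \in \mathcal{Y}} q(y) P(A_y|y) \geq
1 - 2\epsilon$. Further define $B_y = A_y \cap \{x : w(x) \geq \epsilon/K\}$ and $B = \bigcup_{y \in \mathcal{Y}} B_y$. Then define

$$\tilde{P}(x|y) = P(x|y) I(x \in B_y), \qquad \tilde{w}(x) = \sum_y q(y) \tilde{P}(x|y) = w(x) I(w(x) \geq \epsilon/K).$$

By (16), the cardinality of $A$ is upper-bounded by $K$, so those $x \in A$ with $w(x)$ smaller than $\epsilon/K$
contribute at most $\epsilon$ to $w(A)$. Thus

$$\tilde{w}(B) \geq w(A) - \epsilon \geq 1 - 3\epsilon. \tag{19}$$

Observe that, for $x \in B$,

$$\mathbb{E}\,\tilde{P}(x|D(i)) = \tilde{w}(x) \geq \epsilon/K.$$

By (17), $0 \leq \tilde{P}(x|D(i)) \leq k^{-1}$. We can now apply the Chernoff bound (Lemma 3.2) to the i.i.d. random
variables $\tilde{P}(x|D(i))$ (for fixed $x$):

$$\Pr\Big\{\Big|\frac{1}{M}\sum_{i=1}^{M} \tilde{P}(x|D(i)) - \tilde{w}(x)\Big| \geq \epsilon\,\tilde{w}(x)\Big\} \leq 2\exp(-\kappa_0 \epsilon^3/\eta). \tag{20}$$

Hence

$$\Pr\{\text{not } \iota\} \leq 2K \exp(-\kappa_0 \epsilon^3/\eta), \tag{21}$$

where the logic statement $\iota$ is defined as

$$\iota = \Big\{\frac{1}{M}\sum_{i=1}^{M} \tilde{P}(x|D(i)) \in [(1-\epsilon)\tilde{w}(x),\, (1+\epsilon)\tilde{w}(x)] \ \ \forall x \in B\Big\}.$$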
Assume that $\iota$ holds. Then we can define our conditional distribution $E$ as

$$E(i|x) = \frac{1}{(1+\epsilon)M} \frac{\tilde{P}(x|D(i))}{p(x)}.$$

By $\iota$ and the definition of $\tilde{w}$, we can check $E(i|x)$ is a subnormalized conditional distribution,

$$\sum_{i=1}^{M} E(i|x) = \sum_{i=1}^{M} \frac{1}{(1+\epsilon)M} \frac{\tilde{P}(x|D(i))}{p(x)} \leq \frac{\tilde{w}(x)}{p(x)} \leq 1.$$

Finally, we estimate $\|Pu - Ep\|_1$. It is sufficient to do this for the constructed subnormalized
conditional distribution, because we can distribute the remaining weight arbitrarily to fill up to 1. The
joint distribution of $Pu$ is $\{\frac{1}{M} P(x|D(i))\}$, thus

$$\|Pu - Ep\|_1 = \sum_{i=1}^{M} \sum_{x \in B_{D(i)}} \frac{1}{M}\Big(1 - \frac{1}{1+\epsilon}\Big) P(x|D(i)) + \sum_{i=1}^{M} \sum_{x \notin B_{D(i)}} \frac{1}{M} P(x|D(i)). \tag{22}$$

Since $P(B_{D(i)}|D(i)) \leq 1$, we can bound the first term by $\epsilon$. By assumption,

$$\frac{1}{M} \sum_{x \in B} \sum_{i=1}^{M} \tilde{P}(x|D(i)) \geq (1-\epsilon)\,\tilde{w}(B) \geq 1 - 4\epsilon.$$

Since $B_{D(i)} \subseteq B$, the second term in (22) is bounded by $4\epsilon$. We have now shown that if $\iota$ holds
true then

$$\|Pu - Ep\|_1 \leq 5\epsilon.$$

Combining with (21) proves the lemma. □

Corollary 3.6 Consider the joint random variable $XY$ distributed according to $q(y)P(x|y)$. Given
random variables $Y_1, Y_2, \ldots, Y_M$ all distributed according to $q^n$, define the map $D : \{1, 2, \ldots, M\} \to
\mathcal{Y}^n$ by $D(i) = Y_i$. Then, for all $\epsilon, \delta > 0$ and sufficiently large $n$, there exists a conditional probability
distribution $E(i|x^n)$ defined for $i \in \{1, 2, \ldots, M\}$ such that

$$\Pr\{\|Pu - Ep^n\|_1 \geq 5\epsilon\} \leq 2\alpha \exp(-\kappa_0 M \epsilon^3 \beta/\alpha), \tag{23}$$

where $P(x^n|i) = P^n(x^n|D(i))$, $u$ is the uniform distribution on $\{1, 2, \ldots, M\}$, $p$ is the marginal
distribution defined by $p(x) = \sum_{y \in \mathcal{Y}} P(x|y) q(y)$, $\alpha = 2^{n[H(X) + c\delta]}$, and $\beta = 2^{n[H(X|Y) - c\delta]}$.



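Proof We can relate to Lemma 3.5 through the identifications (see Appendix): $\mathcal{X} \to \mathcal{X}^n$,
$\mathcal{Y} \to \mathcal{Y}^n$, $q(y) \to q^n(y^n)$, $P(x|y) \to P^n(x^n|y^n)$, $T \to \mathcal{T}^n_{X,3\delta}$, and $T_y \to \hat{\mathcal{T}}^n_{X|Y,\delta}(y^n)$, with

$$\hat{\mathcal{T}}^n_{X|Y,\delta}(y^n) = \begin{cases} \mathcal{T}^n_{X|Y,\delta}(y^n) & y^n \in \mathcal{T}^n_{Y,\delta}, \\ \emptyset & \text{otherwise.} \end{cases}$$

The four conditions now read (for all $y^n \in \mathcal{Y}^n$),

$$\sum_{y^n} q^n(y^n) P^n(\hat{\mathcal{T}}^n_{X|Y,\delta}(y^n)|y^n) \geq 1 - 2\epsilon, \tag{24}$$

$$\sum_{y^n} q^n(y^n) P^n(\mathcal{T}^n_{X,3\delta}|y^n) \geq 1 - 2\epsilon, \tag{25}$$

$$|\mathcal{T}^n_{X,3\delta}| \leq \alpha, \tag{26}$$

$$P^n(x^n|y^n) \leq \beta^{-1}, \quad \forall x^n \in \hat{\mathcal{T}}^n_{X|Y,\delta}(y^n). \tag{27}$$

These follow from Theorem A.2, switching the roles of $X$ and $Y$ and setting $\delta = \delta'$. □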



We will also need the Holevo-Schumacher-Westmoreland (HSW) theorem [20, 23].

Proposition 3.7 (HSW Theorem) Given an ensemble

$$\rho^{YB} = \sum_{y \in \mathcal{Y}} q(y)\, |y\rangle\langle y|^Y \otimes \rho_y^B$$

and integer $n$, consider the encoding map $F : \{0,1\}^{nS} \to \mathcal{Y}^n$ given by $F(s) = Y_s$, where the
$\{Y_s\}$ are random variables chosen according to the i.i.d. distribution $q^n$. For any $\epsilon, \delta > 0$ and
sufficiently large $n$, there exists a decoding POVM $\{\Lambda_{s'}\}_{s' \in \{0,1\}^{nS}}$ on $B^n$ for the encoding map $F$
with $S = I(Y;B)_\rho - \delta$, such that for all $s$,

$$\mathbb{E} \sum_{s'} |\pi(s'|s) - \delta(s,s')| \leq \epsilon.$$

Here $\pi(s'|s)$ is the probability of decoding $s'$ conditioned on $s$ having been encoded:

$$\pi(s'|s) = \mathrm{Tr}(\Lambda_{s'} \rho_{F(s)}), \tag{28}$$

$\delta(s,s')$ is the delta function and the expectation is taken over the random encoding.
Now we are ready to prove the direct coding theorem: 
Proof of Theorem 3.1 (direct coding) Fix $\epsilon, \delta > 0$ and a sufficiently large $n$ (cf. Corollaries
3.4, 3.6 and Proposition 3.7). Consider the random variables $Y_{lms}$, $l \in \{0,1\}^{nC}$, $m \in \{0,1\}^{nR}$,
$s \in \{0,1\}^{nS}$ (for some $C$, $R$ and $S$ to be specified later), independently distributed according to
$q^n$, where $q(y) = \sum_x p(x) W(y|x)$. The $Y_{lms}$ are going to serve simultaneously as a "randomness
dilution code" $G(l,m,s) = Y_{lms}$ (cf. the $Y_1, \ldots, Y_M$ in Corollary 3.4, $M$ here being $2^{n(C+R+S)}$); as
$2^{nC}$ independent "covering codes" $D_l(m,s) = Y_{lms}$ (cf. the $Y_1, \ldots, Y_M$ in Corollary 3.6, $M$ here
being $2^{n(R+S)}$); and as $2^{n(C+R)}$ independent HSW codes $F_{lm}(s) = Y_{lms}$ (cf. Proposition 3.7). We
will conclude the proof by "derandomizing" the code, i.e. showing that a particular realization of
the random $Y_{lms}$ exists with suitable properties.

Define, as in the two corollaries, $\alpha = 2^{n[H(X) + c\delta]}$, $\beta = 2^{n[H(X|Y) - c\delta]}$, and $\gamma = 2^{n[H(Y) + c\delta]}$.
Define two independent uniform distributions $u'(l)$ and $u(ms)$ on the sets $\{0,1\}^{nC}$ and $\{0,1\}^{nR} \times
\{0,1\}^{nS}$, respectively. The stochastic map $D(y^n|l,m,s)$ is defined as

$$D(y^n|l,m,s) = I(y^n = D_l(m,s)).$$
Corollary 3.6 defines corresponding encoding stochastic maps $\{E_l(m,s|x^n)\}$. For any $l \in \{0,1\}^{nC}$,
define the logic statement $\iota_l$ by $\xi_l \leq 5\epsilon$, where

$$\xi_l = \sum_{x^n} \sum_{m,s} \Big| \sum_{y^n} P^n(x^n|y^n) D(y^n|l,m,s) u(ms) - E_l(m,s|x^n) p^n(x^n) \Big|.$$

By Corollary 3.6, for all $l$,

$$\Pr\{\text{not } \iota_l\} \leq 2\alpha \exp(-2^{n(R+S)} \kappa_0 \epsilon^3 \beta/\alpha). \tag{29}$$

Define the logic statement $\iota'$ by $\xi' \leq 2\epsilon$, where

$$\xi' = \sum_{y^n} \Big| \sum_{l,m,s} D(y^n|l,m,s) u'(l) u(ms) - q^n(y^n) \Big|.$$

By Corollary 3.4,

$$\Pr\{\text{not } \iota'\} \leq 2\gamma \exp(-2^{n(C+R+S)} \kappa_0 \epsilon^2/\gamma). \tag{30}$$

Once we fix the randomness we shall be using

$$\tilde{W}(y^n|x^n) = \sum_{l,m,s} D(y^n|l,m,s) E_l(m,s|x^n) u'(l) \tag{31}$$

to simulate the channel $W^n(y^n|x^n)$. Observe that

$$\sum_{x^n,y^n} \big| p^n(x^n) (W^n(y^n|x^n) - \tilde{W}(y^n|x^n)) \big| = \sum_{x^n,y^n} \Big| \sum_{l,m,s} D(y^n|l,m,s) E_l(m,s|x^n) u'(l) p^n(x^n) - W^n(y^n|x^n) p^n(x^n) \Big| \tag{32}$$

$$\leq \sum_{x^n,y^n} \sum_{l,m,s} D(y^n|l,m,s) u'(l) \Big| E_l(m,s|x^n) p^n(x^n) - \sum_{y'^n} P^n(x^n|y'^n) D(y'^n|l,m,s) u(ms) \Big| + \sum_{x^n,y^n} P^n(x^n|y^n) \Big| \sum_{l,m,s} D(y^n|l,m,s) u'(l) u(ms) - q^n(y^n) \Big|$$

$$\leq \max_l \xi_l + \xi'. \tag{33}$$

To obtain the first inequality we have used

$$D(y^n|l,m,s) D(y'^n|l,m,s) = D(y^n|l,m,s)\, \delta(y^n, y'^n)$$

and the triangle inequality.

We shall now invoke Proposition 3.7. Define $q(y)\rho_y = \sum_x p(x) W(y|x) \rho_x$. Setting $F_{lm}(s) =
Y_{lms}$ and $S = I(Y;B) - c\delta$, there exists a set $\{\Lambda^{(lm)}\}_{lm \in \{0,1\}^{n(C+R)}}$, where each $\Lambda^{(lm)} = \{\Lambda^{(lm)}_{s'}\}_{s' \in \{0,1\}^{nS}}$
is a POVM acting on $B^n$, such that

$$\mathbb{E} \sum_{s'} |\pi_{lm}(s'|s) - \delta(s,s')| \leq \epsilon \tag{34}$$
for all $l$, $m$ and $s$. $\pi_{lm}(s'|s)$ describes the noise experienced in conveying $s$ to Bob, if the channel
$W^n(y^n|x^n)$ were implemented exactly. However, Alice only has the simulation $\tilde{W}(y^n|x^n)$, which
corresponds to the ensemble $\tilde{q}(y^n)\tilde{\rho}_{y^n} := \sum_{x^n} p^n(x^n) \tilde{W}(y^n|x^n) \rho_{x^n}$.

Observe that (32) is another way of expressing $\|(\rho^{XYB})^{\otimes n} - \sigma^{X^nY^nB^n}\|_1 = \|(\rho^{XY})^{\otimes n} -
\sigma^{X^nY^n}\|_1$. Applying monotonicity of trace distance to (33), we have

$$\|(\rho^{YB})^{\otimes n} - \sigma^{Y^nB^n}\|_1 = \sum_{y^n} \|q^n(y^n)\rho_{y^n} - \tilde{q}(y^n)\tilde{\rho}_{y^n}\|_1 \leq \max_l \xi_l + \xi',$$

and hence by the triangle inequality and monotonicity of trace distance

$$\mathbb{E}\,\|\rho_{F(s)} - \tilde{\rho}_{F(s)}\|_1 \leq \sum_{y^n} \|\tilde{q}(y^n)\tilde{\rho}_{y^n} - q^n(y^n)\rho_{y^n}\|_1 + \sum_{y^n} |\tilde{q}(y^n) - q^n(y^n)| \leq 2(\max_l \xi_l + \xi').$$

Thus, the actual noise experienced in conveying $s$ to Bob, denoted by $\tilde{\pi}_{lm}(s'|s)$, obeys $\mathbb{E} \sum_{s'} |\pi_{lm}(s'|s) -
\tilde{\pi}_{lm}(s'|s)| \leq 2(\max_l \xi_l + \xi')$. Combining the above with (34) gives

$$\mathbb{E} \sum_{s'} |\tilde{\pi}_{lm}(s'|s) - \delta(s,s')| \leq 2(\max_l \xi_l + \xi') + \epsilon.$$

Let us focus on the effect this imperfection in the HSW decoding will have on the simulation. By
monotonicity,

$$\mathbb{E} \sum_{x^n, y^n, \hat{y}^n} \Big| \sum_{l,m,s,s'} D(y^n|lms) D(\hat{y}^n|lms') E_l(ms|x^n) u'(l) p^n(x^n) (\tilde{\pi}_{lm}(s'|s) - \delta(s,s')) \Big| \leq 2(\max_l \xi_l + \xi') + \epsilon.$$

By the Markov inequality, $\Pr\{\text{not } \iota''\} \leq \frac{1}{2}$, where $\iota''$ is the logic statement

$$\sum_{x^n, y^n, \hat{y}^n} \Big| \sum_{l,m,s,s'} D(y^n|lms) D(\hat{y}^n|lms') E_l(ms|x^n) u'(l) p^n(x^n) (\tilde{\pi}_{lm}(s'|s) - \delta(s,s')) \Big| \leq 4(\max_l \xi_l + \xi') + 2\epsilon.$$

Now for the derandomization step. Pick $C = H(Y|X) - c\delta$ and $R = I(X;Y) - I(Y;B) + 4c\delta$.
By the union bound $\iota_l$ for all $l$, $\iota'$, and $\iota''$ hold true with probability $> 0$. Hence there exists a
specific choice of $\{Y_{lms}\}$ for which all these conditions are satisfied. Consequently,

$$\sum_{x^n, y^n, \hat{y}^n} \Big| \sum_{l,m,s,s'} D(y^n|lms) D(\hat{y}^n|lms') E_l(ms|x^n) u'(l) p^n(x^n) (\tilde{\pi}_{lm}(s'|s) - \delta(s,s')) \Big| \leq 30\epsilon,$$

i.e. $\|\sigma^{X^nY^n\hat{Y}_\circ^n} - \sigma^{X^nY^n\hat{Y}^n}\|_1 \leq 30\epsilon$, where $\hat{Y}_\circ^n = Y^n$ is Bob's simulation output random variable if
his decoding measurement is perfect. Combining with (33) ($\|(\rho^{XY\hat{Y}})^{\otimes n} - \sigma^{X^nY^n\hat{Y}_\circ^n}\|_1 \leq 7\epsilon$) gives, by the triangle inequality,

$$\|(\rho^{XY\hat{Y}})^{\otimes n} - \sigma^{X^nY^n\hat{Y}^n}\|_1 \leq 37\epsilon.$$

This is almost what we need. The statement of the theorem also insists that the state of the
$B^n$ system is not much perturbed by the measurement. The crucial ingredient ensuring this, as
in [14], is the gentle measurement lemma [26]. To improve readability, we omit the details of its
application here. □
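The choice of rates in the derandomization step can be sanity-checked with simple bookkeeping. The double-exponential bounds (29) and (30) are useful only if the exponents of $2^{n(R+S)}\beta/\alpha$ and $2^{n(C+R+S)}/\gamma$ are positive; the sketch below, with hypothetical numerical entropies, confirms that the stated $C$, $R$ and $S$ leave a margin of $c\delta$ in both.

```python
def exponent_margins(H_Y, H_Y_given_X, I_YB, c_delta):
    """Per-copy exponents left over by the rate choice of the direct coding proof.

    Inputs are hypothetical entropic quantities in bits; c_delta stands for c*delta.
    """
    I_XY = H_Y - H_Y_given_X
    C = H_Y_given_X - c_delta
    R = I_XY - I_YB + 4 * c_delta
    S = I_YB - c_delta
    # beta/alpha = 2^{-n[I(X;Y) + 2 c delta]}, so (29) needs R + S > I(X;Y) + 2 c delta
    covering = (R + S) - (I_XY + 2 * c_delta)
    # gamma = 2^{n[H(Y) + c delta]}, so (30) needs C + R + S > H(Y) + c delta
    dilution = (C + R + S) - (H_Y + c_delta)
    return {"C": C, "R": R, "S": S, "covering": covering, "dilution": dilution}

print(exponent_margins(H_Y=1.0, H_Y_given_X=0.4, I_YB=0.3, c_delta=0.01))
```

Both margins equal $c\delta > 0$, so the failure probabilities in (29) and (30) decay doubly exponentially in $n$ and survive the union bound over the $2^{nC}$ values of $l$.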



Before proving the converse, recall Fannes' inequality [17]:
Lemma 3.8 (Fannes' inequality) Let $P$ and $Q$ be probability distributions on a set with finite
cardinality $d$, such that $\|P - Q\|_1 \leq \epsilon$. Then $|H(P) - H(Q)| \leq \epsilon \log d + \tau(\epsilon)$, with

$$\tau(\epsilon) = \begin{cases} -\epsilon \log \epsilon & \text{if } \epsilon \leq 1/4, \\ 1/2 & \text{otherwise.} \end{cases}$$

Note that $\tau$ is a monotone and concave function and $\tau(\epsilon) \to 0$ as $\epsilon \to 0$. □
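A quick numerical check of Fannes' inequality may help fix the roles of $\epsilon$, $d$ and $\tau$. The sketch below assumes base-2 logarithms (consistent with $\tau(1/4) = 1/2$) and uses two hypothetical binary distributions; the function names are illustrative only.

```python
import math

def tau(eps):
    """The function tau from Fannes' inequality, with log base 2 (assumption)."""
    if eps == 0:
        return 0.0
    return -eps * math.log2(eps) if eps <= 0.25 else 0.5

def entropy(P):
    """Shannon entropy in bits."""
    return -sum(p * math.log2(p) for p in P if p > 0)

def fannes_bound(P, Q):
    """Right-hand side eps*log(d) + tau(eps), with eps = ||P - Q||_1."""
    eps = sum(abs(a - b) for a, b in zip(P, Q))
    return eps * math.log2(len(P)) + tau(eps)

P, Q = [0.5, 0.5], [0.6, 0.4]
print(abs(entropy(P) - entropy(Q)), fannes_bound(P, Q))
```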

Proof of Theorem 3.1 (converse) Consider an $(n, R, C, \epsilon)$ code. Define the uniform random
variable $U$ on the set $\{0,1\}^{nC}$ to denote the common randomness, and $W$ on the set $\{0,1\}^{nR}$ to
denote the encoded message sent to Bob. We have the following Markov chain

$$X^n \to B^n W U \to B^n \hat{Y}^n.$$

The following chain of inequalities holds:

$$\begin{aligned}
nR &\geq H(W|U) \\
&= H(W|U) + I(X^n; B^n|U) - I(X^n; B^n) \\
&\geq I(X^n; B^n W|U) - I(X^n; B^n) \\
&= I(X^n; B^n W U) - I(X^n; B^n) \\
&\geq I(X^n; B^n \hat{Y}^n) - I(X^n; B^n) \\
&\geq n(I(X; BY) - I(X; B) - f(n,\epsilon)) \\
&= n(I(X;Y) - I(Y;B) - f(n,\epsilon)),
\end{aligned}$$

with $f(n,\epsilon) \to 0$ as $n \to \infty$ and $\epsilon \to 0$. The second line follows from $I(X^n; B^n|U) = I(X^n; B^n)$, and the
fourth from $I(X^n; U) = 0$. The fifth line is the data processing inequality based on the Markov
chain above. The sixth is a consequence of Fannes' inequality, and the last line is based on the
Markov chain $Y^n \to X^n \to B^n$.

Based on the Markov chain

$$Y^n \to B^n W U \to \hat{Y}^n,$$

we have another chain of inequalities:

$$\begin{aligned}
nR + nC &\geq H(W) + H(U) \\
&\geq H(WU) \\
&= I(Y^n; B^n W U) + I(WU; B^n) + H(WU|Y^n B^n) - I(Y^n; B^n) \\
&\geq I(Y^n; B^n W U) - I(Y^n; B^n) \\
&\geq I(Y^n; \hat{Y}^n) - I(Y^n; B^n) \\
&\geq n(H(Y) - I(Y;B) - f'(n,\epsilon)),
\end{aligned}$$

with $f'(n,\epsilon) \to 0$ as $n \to \infty$ and $\epsilon \to 0$. The last two inequalities are from the data processing
inequality and Fannes' inequality. Thus any achievable rate pair $(R, C)$ must obey the conditions
of Theorem 3.1. □
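The last step of the first chain uses the identity $I(X;BY) - I(X;B) = I(X;Y) - I(Y;B)$, valid for the Markov chain $Y \to X \to B$. The sketch below verifies it numerically in the special case of classical side information; the channel V(z|x) standing in for B and all names are assumptions made for illustration.

```python
import math

def H(dist):
    """Shannon entropy in bits of a dict of probabilities."""
    return -sum(v * math.log2(v) for v in dist.values() if v > 0)

def marginal(joint, keep):
    """Marginal over the coordinates listed in keep."""
    out = {}
    for key, v in joint.items():
        k = tuple(key[i] for i in keep)
        out[k] = out.get(k, 0.0) + v
    return out

def markov_identity(p, W, V):
    """Return (I(X;YZ) - I(X;Z), I(X;Y) - I(Y;Z)) for p(x) W(y|x) V(z|x)."""
    joint = {(x, y, z): p[x] * W[x][y] * V[x][z]
             for x in range(len(p))
             for y in range(len(W[0]))
             for z in range(len(V[0]))}
    h = lambda keep: H(marginal(joint, keep))
    lhs = (h([0]) + h([1, 2]) - h([0, 1, 2])) - (h([0]) + h([2]) - h([0, 2]))
    rhs = (h([0]) + h([1]) - h([0, 1])) - (h([1]) + h([2]) - h([1, 2]))
    return lhs, rhs

# hypothetical ternary source with binary channel outputs
p = [0.2, 0.3, 0.5]
W = [[0.9, 0.1], [0.5, 0.5], [0.2, 0.8]]
V = [[0.7, 0.3], [0.4, 0.6], [0.1, 0.9]]
print(markov_identity(p, W, V))
```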

□ 
We can use the theory of resource inequalities [10] to succinctly express our main result. In
this case we need to introduce an additional protagonist, the Source, which starts the protocol by
distributing the state

$$\rho^{X_S S} = \sum_x p(x)\, |x\rangle\langle x|^{X_S} \otimes \rho_x^S$$

between Alice and Bob. Alice gets $X_S$ through the classical identity channel $\overline{\mathrm{id}}^{X_S \to X_A}$ and Bob
gets $S$ through the quantum identity channel $\mathrm{id}^{S \to B}$. The goal is for Alice and Bob to end up
sharing the state

$$\sigma^{X_A Y_A Y_B B} = \sum_x p(x) \sum_y W(y|x)\, |y\rangle\langle y|^{Y_A} \otimes |y\rangle\langle y|^{Y_B} \otimes |x\rangle\langle x|^{X_A} \otimes \rho_x^B, \tag{35}$$

as if $\rho^{X_S S}$ was sent through the channel $\overline{W}^{X_S \to Y_A Y_B} \otimes \mathrm{id}^{S \to B}$ (the former is a feedback version
of $W$). Our direct coding theorem is equivalent to the resource inequality

$$\langle \overline{\mathrm{id}}^{X_S \to X_A} \otimes \mathrm{id}^{S \to B} : \rho^{X_S S} \rangle + (I(X_A; Y_B)_\sigma - I(Y_B; B)_\sigma)\,[c \to c] + H(Y_B|X_A)_\sigma\,[cc] \geq \langle \overline{W}^{X_S \to Y_A Y_B} \otimes \mathrm{id}^{S \to B} : \rho^{X_S S} \rangle. \tag{36}$$

The superscript s stands for "source" and is a technical subtlety [10].



4 Applications 

In this section, common randomness distillation and rate-distortion coding with side information 
will be seen as simple corollaries of our main result. 



4.1 Common randomness distillation 

Alice and Bob share n copies of a bipartite classical-quantum state

$$\rho^{X_A B} = \sum_x p(x)\, |x\rangle\langle x|^{X_A} \otimes \rho_x^B,$$

and Alice is allowed a rate R bits of classical communication to Bob. Their goal is to distill a rate
C of common randomness (CR). In terms of resource inequalities, a CR-rate pair (C, R) is said to
be achievable iff

$$\langle \rho^{X_A B} \rangle + R\,[c \to c] \geq C\,[cc].$$

Define the CR-rate function C(R) to be

$$C(R) = \sup\{C : (C, R) \text{ is achievable}\},$$

and the distillable CR function as $D(R) = C(R) - R$. The following theorem was proved in [15].

Theorem 4.1 Given the classical-quantum system XB, then

$$D^*(R) = \max_{Y|X} \{I(Y;B) \mid I(X;Y) - I(Y;B) \leq R\},$$

where $C(R) = C^*(R) = R + D^*(R)$. The maximum is over all conditional probability distributions
$W(y|x)$ with $|\mathcal{Y}| \leq |\mathcal{X}| + 1$.
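We give below a concise proof of the direct coding part of this theorem, relying on our main
result (36) and the resource calculus [10].

Proof We need to prove

$$\langle \rho^{X_A B} \rangle + (I(X_A; Y_B)_\sigma - I(Y_B; B)_\sigma)\,[c \to c] \geq I(X_A; Y_B)_\sigma\,[cc], \tag{37}$$

with $\sigma^{X_A Y_A Y_B B}$ given by (35). Observe the following string of resource inequalities:

$$\begin{aligned}
&\langle \overline{\mathrm{id}}^{X_S \to X_A} \otimes \mathrm{id}^{S \to B} : \rho^{X_S S} \rangle + (I(X_A; Y_B)_\sigma - I(Y_B; B)_\sigma)\,[c \to c] + H(Y_B|X_A)_\sigma\,[cc] \\
&\quad \geq \langle \overline{W}^{X_S \to Y_A Y_B} \otimes \mathrm{id}^{S \to B} : \rho^{X_S S} \rangle \\
&\quad \geq \langle (\overline{W}^{X_S \to Y_A Y_B} \otimes \mathrm{id}^{S \to B})(\rho^{X_S S}) \rangle \\
&\quad \geq \langle \sigma^{Y_A Y_B} \rangle \\
&\quad \geq H(Y_B)_\sigma\,[cc].
\end{aligned}$$

The first inequality is by (36) and Lemma 4.11 of [10], which allows us to drop the s superscript;
the second and third are by parts 5 and 2, respectively, of Lemma 4.1 of [10]. The last inequality
is common randomness concentration [10], which states that $\langle \sigma^{Y_A Y_B} \rangle \geq H(Y_B)_\sigma\,[cc]$. By Lemma
4.10 of [10], $\langle \overline{\mathrm{id}}^{X_S \to X_A} \otimes \mathrm{id}^{S \to B} : \rho^{X_S S} \rangle$ can be replaced by

$$\langle \rho^{X_A B} \rangle = \langle (\overline{\mathrm{id}}^{X_S \to X_A} \otimes \mathrm{id}^{S \to B})(\rho^{X_S S}) \rangle. \tag{38}$$

Thus by (38) and Lemma 4.6 of [10], we have

$$\langle \rho^{X_A B} \rangle + (I(X_A; Y_B)_\sigma - I(Y_B; B)_\sigma)\,[c \to c] + o\,[cc] \geq I(X_A; Y_B)_\sigma\,[cc].$$

Since $[c \to c] \geq [cc]$, by Lemma 4.5 of [10] the $o$ term can be dropped, and (37) is proved. □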

4.2 Rate-distortion trade-off with quantum side information 

Rate-distortion theory, or lossy source coding, is a major subfield of classical information theory 
0. When insufficient storage space is available, one has to compress a source beyond the Shannon 
entropy. By the converse to Shannon's compression theorem, this means that the reproduction of 
the source (after compression and decompression) suffers a certain amount of distortion compared 
to the original. The goal of rate-distortion theory is to minimize a suitably defined distortion 
measure for a given desired compression rate. Formally, a distortion measure is a mapping $d :
\mathcal{X} \times \hat{\mathcal{X}} \to \mathbb{R}^+$ from the set of source-reproduction alphabet pairs into the set of non-negative real
numbers. This function can be extended to sequences $\mathcal{X}^n \times \hat{\mathcal{X}}^n$ by letting

$$d(x^n, \hat{x}^n) = \frac{1}{n} \sum_{i=1}^{n} d(x_i, \hat{x}_i).$$
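The extension of a single-letter distortion measure to sequences is just a per-symbol average. A minimal sketch, using Hamming distortion as a hypothetical choice of $d$:

```python
def hamming(x, x_hat):
    """Single-letter Hamming distortion: 0 if the symbols agree, 1 otherwise."""
    return 0.0 if x == x_hat else 1.0

def sequence_distortion(xs, x_hats, d=hamming):
    """Average per-symbol distortion d(x^n, xhat^n) = (1/n) sum_i d(x_i, xhat_i)."""
    if len(xs) != len(x_hats) or len(xs) == 0:
        raise ValueError("sequences must be non-empty and of equal length")
    return sum(d(a, b) for a, b in zip(xs, x_hats)) / len(xs)

print(sequence_distortion("0110", "0100"))  # one mismatch out of four symbols
```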

We consider here a quantum generalization of the classical Wyner-Ziv problem. The
encoder Alice and decoder Bob share n copies of the classical-quantum system $XB$ in the state $\rho^{XB}$.
Alice sends Bob a classical message at rate R, based on which, and with the help of his
side information $B^n$, Bob needs to reproduce $x^n$ with lowest possible distortion. An $(n, R, d)$
rate-distortion code is given by an encoding map $\mathcal{E}_n : \mathcal{X}^n \to \{0,1\}^{nR}$ and a decoding map $\mathcal{D}_n$
which takes $\mathcal{E}_n(x^n)$ and the state $\rho_{x^n}$ as inputs and outputs a string $\hat{x}^n \in \hat{\mathcal{X}}^n$. $\mathcal{D}_n$ is implemented
by performing an $\mathcal{E}_n(x^n)$-dependent measurement, followed by a function mapping $\mathcal{E}_n(x^n)$ and the
measurement outcome to $\hat{x}^n$. The condition on the reproduction quality is

$$d(\mathcal{E}_n, \mathcal{D}_n) := \mathbb{E}\, d(X^n, \hat{X}^n) = \sum_{x^n} p^n(x^n)\, d(x^n, \mathcal{D}_n(\mathcal{E}_n(x^n), \rho_{x^n})) \leq d.$$

A pair $(R, d)$ is achievable if there exists an $(n, R + \delta, d)$ code for any $\delta > 0$ and sufficiently large
$n$. Define $R_B(d)$ to be the infimum of rates $R$ for which $(R, d)$ is achievable.

Theorem 4.2 Given n copies of a classical-quantum system $XB$ in the state $\rho^{XB}$, then

$$R_B(d) = \lim_{n \to \infty} R_B^{(n)}(d),$$

$$R_B^{(n)}(d) = \frac{1}{n} \min_{Y|X^n}\ \min_{\mathcal{D} : YB^n \to \hat{X}^n} \big(I(X^n; Y) - I(Y; B^n)\big),$$

where the minimization is over all conditional probability distributions $W(y|x^n)$, and decoding maps
$\mathcal{D} : YB^n \to \hat{X}^n$, such that

$$\mathbb{E}\, d(X^n, \mathcal{D}(Y, B^n)) = \sum_{x^n, y} p^n(x^n) W(y|x^n)\, d(x^n, \mathcal{D}(y, \rho_{x^n})) \leq d.$$

Note that $(m + n) R_B^{(m+n)}(d) \leq m R_B^{(m)}(d) + n R_B^{(n)}(d)$. By arguments similar to those for the
channel capacity (see e.g. [3], Appendix A), the limit $R_B(d)$ exists. However, the formula for $R_B(d)$
is a "regularized" form, so $R_B(d)$ cannot be effectively computed.

We omit the easy proof of the converse theorem. The direct coding theorem is an immediate
consequence of Theorem 3.1 (cf. [27]):

Proof of Theorem 4.2 (direct coding) It suffices to prove the achievability of $R_B^{(1)}(d)$, for
a fixed channel $W(y|x)$ and decoding map $\mathcal{D} : YB \to \hat{X}$. Consider an $(n, R, C, \epsilon)$ simulation code
for the channel $W(y|x)$. The simulated state $\sigma^{X^n Y^n B^n}$ can be written as a convex combination of
simulations corresponding to particular values of the common randomness $l$:

$$\sigma^{X^n Y^n B^n} = \sum_l u'(l)\, \sigma_l^{X^n Y^n B^n}.$$

In other words, $\sigma_l^{X^n Y^n B^n}$ is obtained from the encoding $E_l(m, s|x^n)$, POVM set $\{\Lambda^{(lm)}\}_m$,
and decoding $D_l(m, s)$. From the condition for successful simulation and monotonicity of trace
distance it follows that

$$\Big\| \sum_l u'(l)\, \mathcal{D}^{\otimes n}\big(\sigma_l^{X^n Y^n B^n}\big) - \mathcal{D}^{\otimes n}\big((\rho^{XYB})^{\otimes n}\big) \Big\|_1 \leq \epsilon. \qquad (39)$$

For each $l$ define the rate-distortion encoding $\mathcal{E}_n^l$ by $E_l(m, s|x^n)$, and the decoding $\mathcal{D}_n^l$ by the POVM
set $\{\Lambda^{(lm)}\}_m$ followed by $D_l(m, s')$ ($s'$ is the POVM outcome) and $\mathcal{D}^{\otimes n}$. Invoking
$\mathbb{E}\, d(X, \mathcal{D}(Y, B)) \leq d$ and the linearity of the distortion measure gives

$$\sum_l u'(l)\, d(\mathcal{E}_n^l, \mathcal{D}_n^l) \leq d + c_0 \epsilon,$$

for some constant $c_0$. Hence there exists a particular $l$ for which

$$d(\mathcal{E}_n^l, \mathcal{D}_n^l) \leq d + c_0 \epsilon.$$

The direct coding theorem now follows from the achievable rates given by Theorem 3.1. □

The classical Wyner-Ziv problem is recovered by making B into a classical system Z, i.e. by
setting $\rho_x = \sum_z p(z|x)\, |z\rangle\langle z|$ with $\sum_z p(z|x) = 1$ and associating the joint distribution $p(x)p(z|x)$
with the random variable XZ. In this case a single-letter formula is obtained:

$$R_Z(d) = R_Z^{(1)}(d) = \min_{Y|X}\ \min_{\mathcal{D} : YZ \to \hat{X}} \left( I(X; Y) - I(Y; Z) \right).$$
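To make the single-letter expression concrete, the sketch below evaluates the quantity $I(X;Y) - I(Y;Z)$ for one candidate test channel $W(y|x)$ and a joint distribution $p(x,z)$. It does not perform the minimization, and it ignores the decoding map and the distortion constraint. The function names and the binary example are assumptions made for illustration only.

```python
from math import log2
from typing import Dict, Hashable, Tuple

Joint = Dict[Tuple[Hashable, Hashable], float]


def mutual_information(joint: Joint) -> float:
    """I(A;B) in bits for a joint distribution given as {(a, b): probability}."""
    pa: Dict[Hashable, float] = {}
    pb: Dict[Hashable, float] = {}
    for (a, b), p in joint.items():
        pa[a] = pa.get(a, 0.0) + p
        pb[b] = pb.get(b, 0.0) + p
    return sum(
        p * log2(p / (pa[a] * pb[b])) for (a, b), p in joint.items() if p > 0
    )


def wyner_ziv_objective(
    p_xz: Joint, w: Dict[Hashable, Dict[Hashable, float]]
) -> float:
    """I(X;Y) - I(Y;Z) where Y is generated from X alone via w[x][y] = W(y|x)."""
    p_xy: Joint = {}
    p_yz: Joint = {}
    for (x, z), p in p_xz.items():
        for y, w_yx in w[x].items():
            p_xy[(x, y)] = p_xy.get((x, y), 0.0) + p * w_yx
            p_yz[(y, z)] = p_yz.get((y, z), 0.0) + p * w_yx
    return mutual_information(p_xy) - mutual_information(p_yz)


# Assumed example: X is a uniform bit and Z is X flipped with probability 0.1.
p_xz = {(0, 0): 0.45, (0, 1): 0.05, (1, 0): 0.05, (1, 1): 0.45}
identity_channel = {0: {0: 1.0}, 1: {1: 1.0}}   # Y = X
constant_channel = {0: {0: 1.0}, 1: {0: 1.0}}   # Y carries no information

print(wyner_ziv_objective(p_xz, identity_channel))
print(wyner_ziv_objective(p_xz, constant_channel))
```

With the identity test channel the expression reduces to $H(X|Z)$, the Slepian-Wolf rate for lossless reproduction; with the constant channel it is zero.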

It is an open question whether a single-letter formula exists for $R_B(d)$. Following the standard
converse proof of [7, 25] we are able to produce a single-letter lower bound on $R_B(d)$ given by

$$R_B^*(d) = \min_{\mathcal{W} : X \to C}\ \min_{\mathcal{D} : CB \to \hat{X}} \left( I(X; C) - I(C; B) \right),$$

where C is now a quantum system (replacing Y) and $\mathcal{W} : X \to C$ is a classical-quantum channel
(replacing W). Unfortunately, $R_B^*(d)$ appears not to be achievable without entanglement. For
instance, in the d = 0 and B = null case, simulating the channel $X \to C$ with a rate of $I(X; C)$ bits
of communication generally requires $H(C)$ ebits 0]. Since entanglement cannot be "derandomized"
like common randomness, a coding theorem paralleling that of Theorem 4.2 seems unlikely.



5 Bounds on quantum state redistribution 

Our channel simulation with side information result, Theorem 3.1, is only partly quantum. To
formulate a fully quantum version of it, we (i) replace the classical channel W by a quantum
feedback channel [8] $U^{A \to \hat{B}\hat{A}}$, which is an isometry from Alice's system $A$ to the system $\hat{B}\hat{A}$ shared
by Alice and Bob; (ii) replace the classical-quantum state $\rho^{XB}$ by a pure state $|\psi\rangle^{RAB}$ shared
among the reference system, Alice and Bob. Sending the $A$ part of $|\psi\rangle^{RAB}$ through the channel $U$
results in the state

$$|\psi\rangle^{R\hat{A}\hat{B}B} = U |\psi\rangle^{RAB},$$

where $\hat{A}$ is held by Alice and $\hat{B}B$ is held by Bob. Because $U$ is an isometry, the state $|\psi\rangle^{RAB}$ is
equivalent to $|\psi\rangle^{R\hat{A}\hat{B}B}$ with $\hat{A}\hat{B}$ in Alice's possession. Thus simulating the channel $U$ on $|\psi\rangle^{RAB}$
is equivalent to quantum state redistribution: Alice transferring the $\hat{B}$ part of her system $\hat{A}\hat{B}$ to
Bob. We can now ask about the trade-off between qubit channels $[q \to q]$ and ebits $[qq]$ needed to
effect quantum state redistribution. In terms of resource inequalities, we are interested in the rate
pairs $(Q, E)$ such that

$$\langle U_1^{S \to AB} : \rho^S \rangle + Q\,[q \to q] + E\,[qq] \geq \langle U_2^{S \to \hat{A}\hat{B}B} : \rho^S \rangle. \qquad (40)$$

Here $U_1$ is an isometry such that $|\psi\rangle^{RAB} = U_1 |\varphi\rangle^{RS}$, $|\varphi\rangle^{RS}$ is a purification of $\rho^S$, and $U_2 = U \circ U_1$.

We can find two rather trivial inner bounds (i.e. achievable rate pairs) based on previous results.
First let us focus on making use of Bob's side information $B$. The feedback channel simulation will
be performed naively: Alice will implement $U^{A \to \hat{A}\hat{B}}$ locally and then "merge" her system $\hat{B}$ with
Bob's system $B$, treating $\hat{A}$ as part of the reference system $R$. This gives an achievable rate pair of
$(Q_1, E_1) = \left(\frac{1}{2} I(\hat{B}; R\hat{A}),\ -\frac{1}{2} I(\hat{B}; B)\right)$ by the fully quantum Slepian-Wolf (FQSW) protocol [1], a
generalization of [21]. The negative value of E means that entanglement is generated, rather than
consumed.

Now let us ignore the side information and focus on performing the channel simulation
non-trivially. This is the domain of the fully quantum reverse Shannon (FQRS) theorem [1, 8, 12].
Treating $B$ as part of the reference system $R$, the FQRS theorem implies an achievable rate pair
of $(Q_2, E_2) = \left(\frac{1}{2} I(\hat{B}; RB),\ \frac{1}{2} I(\hat{B}; \hat{A})\right)$.

An outer bound is given by the following proposition.
Proposition 5.1 The region in the (Q, E) plane defined by

$$Q \geq \frac{1}{2} I(\hat{B}; R|\hat{A}), \qquad Q + E \geq H(\hat{B}|B)$$

contains the achievable rate region for quantum state redistribution.

Proof Assume that Alice holds $\hat{A}\hat{B}$ and Bob holds $B$. Alice wants to transfer her system $\hat{A}\hat{B}$ to
Bob. By the converse to FQSW (cf. [1]), transferring $\hat{A}\hat{B}$ requires a rate pair $(Q'', E'')$ such that

$$Q'' \geq \frac{1}{2} I(\hat{B}\hat{A}; R), \qquad Q'' + E'' \geq H(\hat{A}\hat{B}|B). \qquad (41)$$

Now let us perform the redistribution successively: first transfer $\hat{B}$ and then $\hat{A}$. Let the cost of
transferring $\hat{B}$ be $(Q, E)$, which we are trying to bound. By FQSW, the cost of transferring the
remaining $\hat{A}$ once Bob has $\hat{B}$ can be achieved with the rate pair $(Q', E')$ such that

$$Q' = \frac{1}{2} I(\hat{A}; R), \qquad Q' + E' = H(\hat{A}|\hat{B}B).$$

If $Q < \frac{1}{2} I(\hat{B}; R|\hat{A})$, then by the chain rule $Q + Q' < \frac{1}{2} I(\hat{B}\hat{A}; R)$, which contradicts (41). Hence
$Q \geq \frac{1}{2} I(\hat{B}; R|\hat{A})$ must hold. Similarly, we can prove that $Q + E \geq H(\hat{B}|B)$. □

The bound $Q + E \geq H(\hat{B}|B)$ is the analogue of the classical bound $R + C \geq H(Y|B)$ from
Theorem 3.1. When $\hat{A}$ = null (simulated channel is the identity) the outer bound is achieved by the
FQSW-based scheme and when $B$ = null (no side information) it is achieved by the FQRS-based
scheme.

6 Discussion 

We have shown here a generalization of both the classical reverse Shannon theorem, and the 
classical-quantum Slepian-Wolf (CQSW) problem. Our main result is a new resource inequality 
(36) for quantum Shannon theory. Unfortunately we were not able to obtain it by naively combining
the reverse Shannon and CQSW resource inequalities via the resource calculus of [10]. Instead
we proved it from first principles. An alternative proof involves modifying the reverse Shannon
protocol to "piggy-back" independent classical information at a rate of $I(Y; B)$ (cf. [13]). In [10]
certain general principles were proved, such as the "coherification rules" which gave conditions for
when classical communication could be replaced by coherent communication. It would be desirable 
to formulate a "piggy-backing rule" in a similar fashion. 

An immediate corollary of our result is channel simulation with classical side information. 
Remarkably, this purely classical protocol is the basic primitive which generates virtually all known 
classical multi-terminal source coding theorems, not just the Wyner-Ziv result [22].

Regarding the state redistribution problem of Section 5, our results have inspired Devetak and
Yard [16] to prove the tightness of the outer bound given by Proposition 5.1, thus providing the
first operational interpretation of quantum conditional mutual information.

Acknowledgement This work was supported in part by the NSF grants CCF-0524811 and 
CCF-0545845 (CAREER). 

A Typicality and conditional typicality 

We follow the standard presentation of [3]. The probability distribution $P_{x^n}$ defined by
$P_{x^n}(x) = N(x|x^n)/n$ is called the empirical distribution or type of the sequence $x^n$, where $N(x|x^n)$ counts the
number of occurrences of $x$ in the word $x^n = x_1 x_2 \ldots x_n$. A sequence $x^n \in \mathcal{X}^n$ is called $\delta$-typical
with respect to a probability distribution $p$ defined on $\mathcal{X}$ if

$$|P_{x^n}(x) - p(x)| \leq p(x)\,\delta, \quad \forall x \in \mathcal{X}. \qquad (42)$$

The latter condition may be rewritten as

$$P_{x^n}(x) \in [p(x)(1 - \delta),\ p(x)(1 + \delta)].$$

The set $T^n_{p,\delta} \subseteq \mathcal{X}^n$ consisting of all $\delta$-typical sequences is called the $\delta$-typical set. When the
distribution $p$ is associated with some random variable $X$, we may use the notation $T^n_{X,\delta}$. Observe
that Eq. (42) implies

$$2^{-n[H(X) + c\delta]} \leq p^n(x^n) \leq 2^{-n[H(X) - c\delta]}$$

for some constant $c$ depending only on $p$. Above, the distribution $p^n$ is naturally defined on $\mathcal{X}^n$ by
$p^n(x^n) = p(x_1) \cdots p(x_n)$.

Given a pair of sequences $(x^n, y^n) \in \mathcal{X}^n \times \mathcal{Y}^n$, the probability distribution $P_{y^n|x^n}$ defined by

$$P_{y^n|x^n}(y|x) = \frac{N(xy|x^n y^n)}{N(x|x^n)} = \frac{P_{x^n y^n}(x, y)}{P_{x^n}(x)}$$

is called the conditional empirical distribution or conditional type of the sequence $y^n$ relative to
the sequence $x^n$. A sequence $y^n = y_1 \ldots y_n \in \mathcal{Y}^n$ is called $\delta$-conditionally typical with respect to
the conditional probability distribution $Q$ and a sequence $x^n = x_1 \ldots x_n \in \mathcal{X}^n$ if

$$P_{y^n|x^n}(y|x) \in [(1 - \delta) Q(y|x),\ (1 + \delta) Q(y|x)], \quad \forall x \in \mathcal{X},\ \forall y \in \mathcal{Y}.$$

The set of such sequences is denoted by $T^n_{Q,\delta}(x^n) \subseteq \mathcal{Y}^n$. When $Q$ is associated with some conditional
random variable $Y|X$, we may use the notation $T^n_{Y|X,\delta}(x^n)$. Define $q(y) = \sum_x Q(y|x)\, p(x)$.
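The two typicality conditions above can be checked directly from symbol counts. The following is a minimal Python sketch under stated assumptions: distributions are passed as dictionaries, and symbols $x$ that never occur in $x^n$ are skipped in the conditional test, since the conditional type is undefined there. Both the function names and that convention are choices made for this example rather than taken from the text.

```python
from collections import Counter
from typing import Dict, Hashable, Sequence


def is_typical(xs: Sequence[Hashable], p: Dict[Hashable, float], delta: float) -> bool:
    """Check |P_{x^n}(x) - p(x)| <= p(x) * delta for every x in the alphabet of p."""
    if len(xs) == 0:
        raise ValueError("sequence must be non-empty")
    counts = Counter(xs)
    if any(x not in p for x in counts):
        return False  # a symbol outside the alphabet has p(x) = 0
    n = len(xs)
    return all(abs(counts[x] / n - px) <= px * delta for x, px in p.items())


def is_conditionally_typical(
    ys: Sequence[Hashable],
    xs: Sequence[Hashable],
    q: Dict[Hashable, Dict[Hashable, float]],
    delta: float,
) -> bool:
    """Check P_{y^n|x^n}(y|x) in [(1-delta) Q(y|x), (1+delta) Q(y|x)], q[x][y] = Q(y|x)."""
    if len(xs) != len(ys) or len(xs) == 0:
        raise ValueError("sequences must be non-empty and of equal length")
    n_x = Counter(xs)
    n_xy = Counter(zip(xs, ys))
    if any(x not in q or y not in q[x] for (x, y) in n_xy):
        return False  # an observed pair to which Q assigns no probability
    for x, row in q.items():
        if n_x[x] == 0:
            continue  # assumption: skip x that does not occur in x^n
        for y, q_yx in row.items():
            cond_type = n_xy[(x, y)] / n_x[x]
            if not (1 - delta) * q_yx <= cond_type <= (1 + delta) * q_yx:
                return False
    return True


p = {"a": 0.5, "b": 0.5}
q = {"a": {"0": 0.5, "1": 0.5}, "b": {"0": 0.5, "1": 0.5}}
print(is_typical("aabb", p, 0.1))
print(is_conditionally_typical("0101", "aabb", q, 0.1))
```

Because the conditions are multiplicative in $p(x)$ and $Q(y|x)$, any symbol or pair of probability zero must not occur at all, which the checks above enforce.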

Theorem A.2 For all $\epsilon > 0$, $\delta > 0$, $\delta' > 0$, and sufficiently large $n$, for all $x^n \in T^n_{p,\delta'}$,

1. $2^{-n[H(Y|X) + c\delta + c'\delta']} \leq Q^n(y^n|x^n) \leq 2^{-n[H(Y|X) - c\delta - c'\delta']}$ for $y^n \in T^n_{Q,\delta}(x^n)$.

2. $Q^n(T^n_{Q,\delta}(x^n)|x^n) = \Pr\{Y^n \in T^n_{Q,\delta}(x^n) \mid X^n = x^n\} \geq 1 - \epsilon$.

3. $(1 - \epsilon)\, 2^{n[H(Y|X) - c\delta - c'\delta']} \leq |T^n_{Q,\delta}(x^n)| \leq 2^{n[H(Y|X) + c\delta + c'\delta']}$.

4. If $y^n \in T^n_{Q,\delta}(x^n)$, then $(x^n, y^n) \in T^n_{pQ,\,\delta + \delta' + \delta\delta'}$, and hence $y^n \in T^n_{q,\,\delta + \delta' + \delta\delta'}$.

5. $Q^n(T^n_{q,\,\delta + \delta' + \delta\delta'}|x^n) \geq 1 - \epsilon$.

for some constants $c$, $c'$ depending only on $p$ and $Q$.